Cost-sensitive Decision Trees with Post-pruning and Competition for Numeric Data

Authors

  • Zilong XU
  • Fan MIN
  • William ZHU
Abstract

Decision trees are an effective classification approach in data mining and machine learning. In some applications, test costs and misclassification costs should be considered when inducing decision trees. Recently, several cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, ICET and λ-ID3, have been proposed to address this issue. In this paper, we develop a decision tree algorithm for numeric data, inspired by C4.5, with post-pruning and competition. The test-cost-weighted information gain ratio serves as the heuristic while building the tree. The focus of the algorithm is the post-pruning technique, which considers the tradeoff between test costs and misclassification costs. To obtain even better results, we employ a competition approach that constructs a forest and selects the best tree. Experimental results indicate the effectiveness of the heuristic function, the efficiency of the post-pruning technique, and the usefulness of the competition approach.
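
The abstract does not give formulas, so the following is only a minimal sketch of the three ideas it names, under stated assumptions. The weighting form gain_ratio * (1 + test_cost) ** lam with lam <= 0, the pruning rule, and all function names are hypothetical choices made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5-style gain ratio for the binary numeric split value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    n = len(labels)
    pl, pr = len(left) / n, len(right) / n
    gain = entropy(labels) - pl * entropy(left) - pr * entropy(right)
    split_info = -(pl * math.log2(pl) + pr * math.log2(pr))
    return gain / split_info


def weighted_gain_ratio(values, labels, threshold, test_cost, lam):
    """ASSUMED weighting: penalise expensive tests; lam <= 0, lam = 0 ignores cost."""
    return gain_ratio(values, labels, threshold) * (1.0 + test_cost) ** lam


def should_prune(subtree_avg_cost, leaf_avg_cost):
    """ASSUMED post-pruning rule: replace a subtree with a leaf when the leaf's
    average misclassification cost does not exceed the subtree's average
    total cost (test costs plus misclassification costs)."""
    return leaf_avg_cost <= subtree_avg_cost


def select_best(candidates, average_cost):
    """Competition: keep the candidate (e.g. a tree built with one lam value)
    whose average cost, as computed by average_cost, is lowest."""
    return min(candidates, key=average_cost)
```

In this sketch, a forest would be built by calling the tree inducer once per candidate lam value, and select_best would then pick the tree with the lowest average cost on the data used for evaluation.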

Related resources

Cost-sensitive C4.5 with post-pruning and competition

Decision trees are an effective classification approach in data mining and machine learning. In applications, test costs and misclassification costs should be considered when inducing decision trees. Recently, several cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX and λ-ID3, have been proposed to address this issue. These algorithms handle only symbolic data. In this paper, we ...

A Competition Strategy to Cost-Sensitive Decision Trees

Learning from data with test cost and misclassification cost has been a hot topic in data mining. Many algorithms have been proposed to induce decision trees for this purpose. This paper studies a number of such algorithms and presents a competition strategy to obtain trees with lower cost. First, we generate a population of decision trees using λ-ID3 and EG2 algorithms through considering info...

Cost-Sensitive Decision Trees with Pre-pruning

This paper explores two simple and efficient pre-pruning strategies for the cost-sensitive decision tree algorithm to avoid overfitting. One is to limit the cost-sensitive decision trees to a depth of two. The other is to prune the trees with a pre-specified threshold. Empirical study shows that, compared to the error-based tree algorithm C4.5 and several other cost-sensitive tree algorithms, t...

Induction of decision trees in numeric domains using set-valued attributes

Conventional algorithms for decision tree induction use an attribute-value representation scheme for instances. This paper explores the empirical consequences of using set-valued attributes. This simple representational extension, when used as a pre-processor for numeric data, is shown to yield significant gains in accuracy combined with attractive build times. It is also shown to impr...

CC4.5: cost-sensitive decision tree pruning

There are many methods to prune decision trees, but the idea of cost-sensitive pruning has received much less investigation even though additional flexibility and increased performance can be obtained from this method. In this paper, we introduce a cost-sensitive decision tree pruning algorithm called CC4.5 based on the C4.5 algorithm. This algorithm uses the same method as C4.5 to construct th...

Publication date: 2013